Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 4 de 4
Filtrar
Mais filtros

Bases de dados
Tipo de documento
Assunto da revista
País de afiliação
Intervalo de ano de publicação
1.
J Biomed Inform ; 42(5): 950-66, 2009 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-19535011

RESUMO

In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.


Assuntos
Armazenamento e Recuperação da Informação/métodos , Prontuários Médicos , Processamento de Linguagem Natural , Semântica , Indexação e Redação de Resumos , Pesquisa Biomédica , Guias como Assunto , Humanos , Internet , Modelos Estatísticos , Neoplasias , Terminologia como Assunto , Interface Usuário-Computador
2.
BMC Bioinformatics ; 9 Suppl 11: S3, 2008 Nov 19.
Artigo em Inglês | MEDLINE | ID: mdl-19025689

RESUMO

BACKGROUND: The Clinical E-Science Framework (CLEF) project has built a system to extract clinically significant information from the textual component of medical records in order to support clinical research, evidence-based healthcare and genotype-meets-phenotype informatics. One part of this system is the identification of relationships between clinically important entities in the text. Typical approaches to relationship extraction in this domain have used full parses, domain-specific grammars, and large knowledge bases encoding domain knowledge. In other areas of biomedical NLP, statistical machine learning (ML) approaches are now routinely applied to relationship extraction. We report on the novel application of these statistical techniques to the extraction of clinical relationships. RESULTS: We have designed and implemented an ML-based system for relation extraction, using support vector machines, and trained and tested it on a corpus of oncology narratives hand-annotated with clinically important relationships. Over a class of seven relation types, the system achieves an average F1 score of 72%, only slightly behind an indicative measure of human inter annotator agreement on the same task. We investigate the effectiveness of different features for this task, how extraction performance varies between inter- and intra-sentential relationships, and examine the amount of training data needed to learn various relationships. CONCLUSION: We have shown that it is possible to extract important clinical relationships from text, using supervised statistical ML techniques, at levels of accuracy approaching those of human annotators. Given the importance of relation extraction as an enabling technology for text mining and given also the ready adaptability of systems based on our supervised learning approach to other clinical relationship extraction tasks, this result has significance for clinical text mining more generally, though further work to confirm our encouraging results should be carried out on a larger sample of narratives and relationship types.


Assuntos
Armazenamento e Recuperação da Informação/métodos , Processamento de Linguagem Natural , Algoritmos , Humanos , Sistemas Computadorizados de Registros Médicos
3.
AMIA Annu Symp Proc ; : 625-9, 2007 Oct 11.
Artigo em Inglês | MEDLINE | ID: mdl-18693911

RESUMO

The Clinical E-Science Framework (CLEF) project is building a framework for the capture, integration and presentation of clinical information: for clinical research, evidence-based health care and genotype-meets-phenotype informatics. A significant portion of the information required by such a framework originates as text, even in EHR-savvy organizations. CLEF uses Information Extraction (IE) to make this unstructured information available. An important part of IE is the identification of semantic entities and relationships. Typical approaches require human annotated documents to provide both evaluation standards and material for system development. CLEF has a corpus of clinical narratives, histopathology reports and imaging reports from 20 thousand patients. We describe the selection of a subset of this corpus for manual annotation of clinical entities and relationships. We describe an annotation methodology and report encouraging initial results of inter-annotator agreement. Comparisons are made between different text sub-genres, and between annotators with different skills.


Assuntos
Armazenamento e Recuperação da Informação/métodos , Sistemas Computadorizados de Registros Médicos , Processamento de Linguagem Natural , Humanos , Semântica
4.
Comp Funct Genomics ; 6(1-2): 86-93, 2005.
Artigo em Inglês | MEDLINE | ID: mdl-18629294

RESUMO

Recent years have seen a huge increase in the amount of biomedical information that is available in electronic format. Consequently, for biomedical researchers wishing to relate their experimental results to relevant data lurking somewhere within this expanding universe of on-line information, the ability to access and navigate biomedical information sources in an efficient manner has become increasingly important. Natural language and text processing techniques can facilitate this task by making the information contained in textual resources such as MEDLINE more readily accessible and amenable to computational processing. Names of biological entities such as genes and proteins provide critical links between different biomedical information sources and researchers' experimental data. Therefore, automatic identification and classification of these terms in text is an essential capability of any natural language processing system aimed at managing the wealth of biomedical information that is available electronically. To support term recognition in the biomedical domain, we have developed Termino, a large-scale terminological resource for text processing applications, which has two main components: first, a database into which very large numbers of terms can be loaded from resources such as UMLS, and stored together with various kinds of relevant information; second, a finite state recognizer, for fast and efficient identification and mark-up of terms within text. Since many biomedical applications require this functionality, we have made Termino available to the community as a web service, which allows for its integration into larger applications as a remotely located component, accessed through a standardized interface over the web.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA